108 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
Chinese Czech English Finnish German Latvian Romanian Russian Turkish
Availability:
Freely Available
License:
Size:
3.9 MByte
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model
- Paper track: Short/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Kosuke Takahashi | WMT18 metrics shared task data | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
Freely Available
License:
MIT License
Size:
None
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: 2kenize: Tying Subword Sequences for Chinese Script Conversion
- Paper track: Long/Phonology, Morphology and Word Segmentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | A Pranav | Simplified Chinese to Traditional Chinese Conversion | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
Freely Available
License:
MIT License
Size:
None
Production Status:
Use:
Document Classification, Text categorisation
- Paper title: 2kenize: Tying Subword Sequences for Chinese Script Conversion
- Paper track: Long/Phonology, Morphology and Word Segmentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | A Pranav | Traditional Chinese Topic Classification | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
From Owner
License:
Size:
300 MByte
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
- Paper title: Dynamic Memory Induction Networks for Few-Shot Text Classification
- Paper track: Short/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ruiying Geng | Open Domain Intent Classification | /N |
Documentation:
None
Multimodal/Multimedia
Corpus,
Language Type:
Bilingual
Languages:
Chinese English
Availability:
Freely Available
License:
Size:
2281 sentences
Production Status:
Newly created-finished
Use:
Multimodal Sentiment Analysis
- Paper title: CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality
- Paper track: Long/Speech and Multimodality
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Wenmeng Yu | CH-SIMS | /N |
Documentation:
None
Multimodal/Multimedia
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
From Owner
License:
NA
Size:
77.28 hours
Production Status:
Newly created-in progress
Use:
Humor Prediction
- Paper title: Predicting Humor by Learning from Time-aligned Comments
- Paper track: 3.8 Perception of paralinguistic phenomena/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Zixiaofan Yang | The Bilibili Corpus | /N |
Documentation:
NA
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Central Khmer Chinese Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Northern Khmer Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu Uzbek Vietnamese Wu Chinese Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Bilingual
Languages:
Arabic Bengali Chinese English Hindi Korean Russian Thai Urdu
Availability:
From Data Center(s)
License:
LDC
Size:
595 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2006 NIST Speaker Recognition Evaluation Training Set | /N |
Documentation:
None
File
Corpus,
Language Type:
Bilingual
Languages:
Chinese English
Availability:
Apply under contract
License:
Size:
4.0 GByte
Production Status:
Finished
Use:
training ASR system in ATC domain
- Paper title: ATCSpeech: a Multilingual pilot-controller Speech Corpus from Real Air Traffic Control Environment
- Paper track: 12.8 Metadata descriptions of speech, audio and te/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yi LIN | ATCSpeech | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
Will be available after finished
License:
Size:
24.3 MByte
Production Status:
Newly created-in progress
Use:
Emotion Recognition/Generation
- Paper title: An Event-comment Social Media Corpus for Implicit Emotion Analysis
- Paper track: Infrastructural Issues/Large Projects/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Helena Yan Ping Lau | Chinese Event-comment Emotion Corpus | /N |
Documentation:
None




